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Abstract 

We compare variants of Anderson Mixing with the Jacobian-Free Newton-Krylov and Broyden methods applied to 
the k-eigenvalue formulation of the linear Boltzmann transport equation. We present evidence that one variant of 
Anderson Mixing finds solutions in the fewest number of iterations. We examine and strengthen theoretical results of 
Anderson Mixing applied to linear problems. 



1. Introduction 

The k-eigenvalue formulation of the linear Boltzmann transport equation is widely used to characterize the criti- 
cality of fissioning systems 0~1[2). Physically, the largest eigenvalue, generally denoted by k, is the effective neutron 
multiplication factor that, in the equation, scales the fission production term to achieve a steady-state solution. The 
corresponding eigenmode describes the neutron flux profile for that steady-state (i.e., critical) system and, when the 
system is close to criticality, provides useful information about the distribution of the neutron population in space and 
velocity [ 1 1. Mathematically, the equation is a standard eigenproblem for which power iteration is well-suited because 
the eigenmode of interest is most commonly that with the largest magnitude [3 |. For the deterministic k-eigenvalue 
problem each step of a true power iteration incurs a heavy computational cost due to the expense of fully inverting the 
transport operator, therefore a nonlinear fixed point iteration is generally employed in which an approximate inversion 
of this operator is performed at each iteration. In addition to power iteration, the Implicitly Restarted Arnoldi Method 
has been applied to this problem and has the advantage of being able to compute additional eigenmodes [4] . However, 
the transport matrix must be fully inverted at each iteration, diminishing its computational efficiency and attractiveness 
when only the dominant eigenmode is desired. 

Recently, more sophisticated nonlinear iteration methods employing approximate inversion, predominantly Jacobian- 
Free Newton-Krylov (JFNK), have been applied with great success [5 6 7]. However, there has not yet been a com- 
prehensive comparison of nonlinear solvers. This paper presents such a comparison, examining the performance of 
three nonlinear solvers — JFNK, Broyden's Method and Anderson Mixing — applied to the k-eigenvalue problem. A 
variant of Anderson Mixing |8 1, first described in |9 1, is of particular interest because, in the experience of the authors, 
it is frequently computationally more efficient than JFNK and Broyden's method. 

JFNK is an inexact Newton's method in which the inversion of the Jacobian is performed to arbitrary precision 
using a Krylov method (most commonly GMRES) and the Jacobian itself is never formed, but rather its action is 
approximated using finite differences of arbitrarily close state data. JFNK can be expected to converge quadratically 
in a neighborhood containing the solution (cf. IflOl and the references therein). Each iteration of JFNK requires a 
nested 'inner' iteration and the bulk of the computational effort is expended in this 'inner' Krylov inversion of the 
Jacobian at each 'outer' Newton step. At the end of each inversion, the accumulated Krylov space is discarded even 
though the Jacobian is expected to change minimally during the final Newton steps when a similar space will be rebuilt 
in the next Newton iteration. In effect, at the end of each iteration, JFNK discards information that may be of use in 
successive iterations. 
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In its standard formulation Broyden's method (cf. ifTTl ). like low memory BFGS (cf. [12|), uses differences in 
state from successive iterations to make low rank updates to the Jacobian. The Sherman-Morrison- Woodbury up- 
date rule is then used to compute the action of the inverse of the Jacobian after such an update. While Broyden's 
method is restricted to low-rank updates, it provides an explicit representation of the Jacobian allowing one to employ 
the Dennis-More condition |[T3l to show that it converges super-linearly in a neighborhood containing the solution. 
Further, it has been shown to solve linear problems of size N in at most 2N iterations (cf. [11| and the references 
therein.) 

Anderson Mixing [8| uses differences in state from successive iterations to infer information about the inverse 
of the Jacobian, which is assumed to be roughly constant in a neighborhood containing all the iterates. Unlike 
Broyden's method, the updates can be of arbitrary rank. Recent results by Walker and Ni Ifl4l show that, with mild 
assumptions, Anderson Mixing applied to a linear problem performs as well as the generalized minimum residual 
method (GMRES) fl5l . In this regard, Anderson Mixing may be thought of as a nonlinear version of GMRES. In 
independent work, Carlson and Miller formulated a so-called nonlinear Krylov acceleration [9| method which we 
show to be a variant of Anderson Mixing. Further, we examine the hypotheses of a central theorem presented by 
Walker and Ni and argue that they will, with high probability, be met and that they can be omitted entirely if one is 
willing to accept a small performance penalty. While Anderson Mixing performs best for our numerical experiments, 
there is no theory that the authors of this paper know of that can characterize its performance for nonlinear problems. 

The rest of this paper is organized as follows: In Section[2]we discuss some of the analysis of Anderson Mixing, 
motivate an implementation choice and elaborate on the results of Walker and Ni in 1 14 1. In Section[3]we describe the 
k-eigenvalue problem. Our numerical results can be found in Section |4] and we conclude with an overview. 

2. Background of Anderson Mixing and Nonlinear Krylov Acceleration 

2.1. Nonlinear Krylov Acceleration 

In (9] [16] Carlson and Miller outlined an iterative method, dubbed nonlinear Krylov acceleration or NKA, for 
accelerating convergence of fixed-point iteration by using information gained over successive iterations. The problem 
they consider is to find a root of the function R N 3n f(x) € R N . One approach is to apply a fixed-point iteration of 
the form 

X„+i = x„ - /(x„). (1) 

Fixed-point iterations can be viewed in the context of an approximate Newton iteration where the derivative of /, 
denoted Df, is replaced by the identity /. The motivation behind NKA is instead to approximate Df using information 
from previous iterates, improving that approximation over successive iterations, and in cases where no applicable 
approximation is available, revert to a fixed-point iteration where Df is replaced with /. 

NKA requires an initial guess xo and at the « th invocation provides an update y„ + \ that is used to derive the n + 1 st 
iterate from the « th . This method may be written as 

v„ + i= NKA[f(x„), . . .] 

where NKA[f(x n ), . . .] is the update computed by the nonlinear Krylov accelerator. We use the brackets and ellipsis 
to indicate that NKA is stateful and draws on previous information when computing an update. 

On its first invocation (n = 0) NKA has no information and simply returns /(xo). At iteration n > it has access 
to the M vectors of differences for some natural number M e (0, n), 

V,- = Xi-\ - Xj and w; = /(x*_i) - f(xj) for i = n — M + 1, . . . ,n, 

and where, for convenience, we shall define 'Wn to be the span of the w, vectors available at iteration n. If / has a 
constant and invertible derivative, Df, then we would have 

Df\i = Wj and D/" 1 w i = V;. (2) 
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We denote by P<w n the operator that projects orthogonally onto the subspace W„ and write the identity 

/(X„) = PwJ(Xn) + (/(X„) - P<wJ(*n)). 

Note that /(x„) - P<w„f(?n) is orthogonal to "W„. If the w,- vectors are linearly independent, then there is a unique set 
of coefficients z (,,) := (zf\zf\ . . . ,z£ n) ) e R" so that 

n 

i=n-M+ 1 

and hence by Eq. |2) 

Df-^J(x n ) = Df- i J] J] zfV«, 

i=«-M+l i=n-M+\ 

The idea motivating Carlson and Miller is to project /(x„) onto "TV,, (the space where the action of Df~ l is known), 
compute that action on the projection and, for lack of information, apply a fixed-point update given in Eq. ([T| on the 
portion of /(x„) that is orthogonal to "W,,. The resulting formula for x„ +1 is 



v„+i = 



X«+l = X n — V n +l, 



2 /<w- ^ zfv,- 



i=n-M+ 1 



where the vector of coefficients in the orthogonal projection, z (n) , is the solution to the projection, alternatively mini- 
mization, problem 



i> ' = arg mm 



/(x„) - ^ y,w, 

i=n-M+l 

2.2. jVKA fli Anderson Mixing 

Carlson and Miller had essentially rediscovered, albeit in a slightly different form, the iterative method presented 
much earlier by Anderson in [8 1, which is now commonly referred to as Anderson Mixing or Anderson Acceleration. 
Anderson was studying the problem of finding a fixed point of some function R N 3 x — > G(x) e R N . In Section 4 
of 1 8 1 he defines a fixed point residual for iteration i as 

r, = G(Xj) - x ; , 

which can be used to measure how x, fails to be a fixed point. With this Anderson proposed an updating scheme of 
the forn£] 



x«+i — x n 



M 



J] if ) (x„ - x„_i) - A r n - J] zf[r„ - r n _,] 



-!=] V ,= 1 

where the vector of coefficients, z (n) e R M , is chosen to minimize the quantity 

M 



i=1 



and where Anderson requires that /?„ > 0. Here Anderson considers a depth of M difference-vectors. 



Anderson uses 6" to denote the r* update coefficient of the n* iterate, where we have written t£. in keeping with the presentation of NKA. 
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The problem of finding a root of / may be recast as a fixed point problem by defining G(x) = x - /(x), and then 
r, = -/(x,). If, for each n, one defines the vectors 



~<n) 

v 



x„_; and W; 



/(x„) - /(x„_i) forz' = 1, 
then the Anderson Mixing update becomes 



x„+i 



L i=l 



!=1 



where 



z (n) = arg min yeR , 



Note that spanfWj , 



/(x„)-^3',w; 

[=i 



(») 



, w 



M 1 



span{w„_ji 



Anderson Mixing are related by the formula 

M 



;'=n-i+l 



, w„) = *W„. In particular the update coefficients of NKA and 



(3) 



Further, for a constant and invertible derivative, D/ _1 w^' !) = v . With this it is clear that, for linear problems, NKA 
is equivalent to Anderson Mixing when /3„ = 1 ; the difference is only in the choice of basis vectors that span TVj 
resulting in a change of variables given by Eq. (|3J. One advantage of the NKA formulation is that it uses differences 
of successive function evaluations to form a basis for "W n . These differences can be used for M successive iterations. 
In contrast, Anderson's formulation forms a basis for 1V„ using differences of the most recent function evaluation 
with all previous evaluations, which must be recomputed at every iteration. 

2.3. Analytic Examinations of Anderson Mixing Applied to a Linear Problem 

In 04]], Walker and Ni consider the behavior of a generalized Anderson Mixing algorithm when it is applied 
to linear problems and when all previous vectors are stored, i.e. when M = n. They prove that, under modest 
assumptions, a class of variants of Anderson Mixing are equivalent to GMRES. They consider Anderson Mixing as a 
means to find a fixed-point of G. Like Anderson they compute r, = G(x,) - x,- at each step. 

In their presentation of Anderson Mixing, though, they compute the update coefficients^] 



z (n) = arg mm 



^y,r,- subject to Jj^i"' = ^ 



i=0 



which are then used to form the update 



x„+i = ^2- n) G(x ; ). 



Again choosing G(x) = x - /(x) (and hence r, = -/(x,)) and noting that the constraint requires that 

a- 1 



i=0 



Walker and Ni used fj and where we, for consistency, are using r j and z l "' . 
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one has, as noted by Walker and Ni, that the above is equivalent to Anderson's original formulation 



z w = arg min R „ 



n-1 



/(xJ-^yiC/CxJ-ZCx,)) 



i=0 



and 



Xn+1 — X n 



n-1 



J] z< n) (x„ - x,-) + /(x„) - J] z< n) (/(x„) - /(X,)) 

/=() V i=0 



Because Anderson Mixing and NKA are equivalent to each other and to the Walker and Ni formulation, we may 
restate one of Walker and Ni's theorems as follows: 

Theorem 2.1 (Walker and Ni |fl4j). Suppose that we search for a root of the function /(x) = Ax - b starting with 
Xo using either the NKA update or Anderson's update with fj„ — 1. Suppose further that A is non-singular. Let 
x gmres denote the n 'h it era t e generated by GMRES applied to the problem Ax = b with starting point Xo and let 

+ and Wr™^ || 2 < || r G «* ra || 2 



v cmres _ ^ x cmres _ jj denote the associated residual. If for some n > 0, 
/or all < j < n, then the n + 1 iterate generated by either update rule is given by x„ + i 



n-1 



(I - A)x^ M «" + ft. 



It should be noted that the original presentation of the theorem given in [14| applies to a class of orthogonal 
projection methods where we have presented the theorem as applied to two members of this class: Anderson's original 
method and the NKA variant. It should also be noted that this theorem of Walker and Ni shows that 



nv„ = A7C„, 

where TCi = span{r ( ), Aro, . . . , A" _1 ro}. 
An immediate corollary is 

Corollary 2.2. Let r„ = Ax„ — b denote the residual associated with the n' h iterate generated by either Anderson 
Mixing or NKA. Under the assumptions ofTheorem \2.1\ 

\\r n+1 \\2<\\l-M\\^ MRES h, 
where || • || is the L 2 operator norm. 
Proof. 

r„+i = Ax„+i - b 

A [(I - A)x^ MRES +b]-b 

= Ax° MRES - b - A(Ax^ MRES - b) 
(I - A)(Ax° MRES - b), 

from which the claim follows. □ 



The value of Corollary 2.2 is that, in the linear case, convergence of the non-truncated versions of both NKA and 
Anderson Mixing have the same characterizations as GMRES, i.e. when considering a normal matrix A, the residual 
is controlled by the spectrum of A, provided as GMRES doesn't stagnate. 



2.4. Non-stagnation of GMRES 

When considering a linear problem, the coefficients z ( "' (Anderson Mixing) and z (w) (NKA) do two things. 
Through a minimization process, they select the best approximation within a given subspace, and they also select 
the subspace for the next iteration. The failure mode is that the best approximation within a given subspace doesn't 
use the information from the most recent iteration, in which case the next subspace will be the same as the current 
one and Anderson Mixing and NKA will become trapped. Anderson briefly discusses this in his presentation of the 
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methocj^] Lemma 2.1.5 from Ni's dissertation ifTTll addresses this more directly. What Walker and Ni recognize in ifMl 
is that this failure mode corresponds to the stagnation of GMRES. 

There have been several examinations of when GMRES stagnates. Zavorian, O'Leary and Elman in lfl8ll and 
Greenbaum, Ptak and Strakos in |fT9l present examples where GMRES does not decrease the norm of the residual 
on several successive iterations, i.e. it stagnates. While GMRES converges for such cases, Anderson Mixing and 
NKA will not. Greenbaum and Strakos show in |20| that the residual for GMRES applied to a matrix A is strictly 
decreasing, i.e. no stagnation, if (r, Ar) + for all r satisfying ||r||2 + 0. This in turn will ensure the convergence of 
Anderson Mixing and NKA. 

It is important to bear in mind that, in practice, the subspace < W„ is likely to have modest dimension, say 10, and 
is sitting in where N is much larger, often on the order of thousands or millions. The slightest perturbation of a 
vector will, with probability one, push it off of this low dimensional space. Even so, there is a simple change to the 
update step that provides a theoretical guarantee that NKA will still converge even when GMRES stagnates. This 
guarantee comes at a slight performance cost and is accomplished by modifying the update coefficients to ensure that 
*W„-: c <W„. 

We consider a modification to the NKA update rul^] 



x„+i 



i=l V i=l 



where z (n) is chosen to minimize 



/(x„)-£w,.zf» 



We now add the following safety check: if ||w„||2 + 0, then perform the following update 

||/(x„)-z? =1 w4 ra) || 2 



Jri) 



l|W„|| 2 



(n) 

for some positive e, where one chooses to add or subtract based on which will make the modified version 

further from zero. As will be shown in the proof of Theorem [23] the numerator of the modification is zero only when 

NKA has found the root. 

With this we can strengthen Corollary |2.2| as it applies to NKA as follows: 

Theorem 2.3. Let A be non-singular square matrix and suppose that we search for a root of the function /(x) = Ax— b 
starting with Xq using the modified version of NKA. Let r^ MRES = Ax^ MRES - b denote the n' h GMRES residual and 
let r„ denote the n' h NKA residual, then 

l|r„ + i||2<(l+e)IH-A||||r^ MR£S || 2 . 



The proof can be found in Appendix A The limitation of this result is that the orthogonal projection will become 



more poorly conditioned as s decreases. Note that for the same reason both Anderson Mixing and NKA can develop 
arbitrarily poorly conditioned projection problems. 



2.5. Relationship Between Anderson Mixing and Broyden 

Fang and Saad in [21 1 provide a comparison between several methods including Anderson Mixing and Broyden. 
As noted above, the central difference between the two is that Broyden uses secant information to compute low rank 
updates to the approximation of Df, where Anderson Mixing uses the same information to compute arbitrary rank 
updates to Df~ l . Fang and Saad clarify this relationship and place a variant of Broyden and Anderson Mixing in a 
broader class of multi-secant update methods. 



3 This need to expand the space over which one is searching necessitates /?„ # 0. 
4 The following could be adapted to many formulations of Anderson Mixing 
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Fang's and Saad's presentation of Anderson Mixing (cf. Eq. (24) in [21 1) in the notation adopted in this paper is 

/(x„) - J w^ n) ] , 

.... i=n-M+\ ' 

where 

n 

/(xj- 2] w * • 

i=n-M+l 2 

and where /? is chosen to be positive. 

It is worth noting that Fang and Saad have presented a version of Anderson Mixing that is more closely related 
to Carlson's and Miller's NKA method, but where NKA would require that, in the above formulation, /? = — 1. This 
leads to the question what is the best value of /? for a given problem, or in the case of Anderson's presentation what 
is the best choice for/?,, = -/?. We report numerical results for/? = — 1 (the NKA formulation) and /? = 1 (denoted 
NKA-x). 

2.6. NKA in Practice 

For our experiments we shall use the NKA formulation of Anderson Mixing. We shall let yS = 1 and /? = -1, as 
defined by Fang and Saad. Each w, vector is rescaled to have unit-norm, and the associated v, vector is scaled by the 
same factor. In our implementation we use a Cholesky Factorization to solve the least-squares problem associated 
with the orthogonal projection. This requires that the Gramian of the W; vectors be positive definite. We enforce this 
by choosing a linearly independent subset of the w, vectors as follows: At the beginning of iteration n we start with 
w„ and consider W„_i. If the angle between w„ and w„_i is less than some tolerance we discard W„_i. We iterate in 
this manner over successively older vectors, keeping them if the angle they make with the space spanned by the kept 
Wj vectors is greater than the tolerance. This ensures that we may use Cholesky factorization and that the condition 
number of the problem is bounded. 

Memory constraints require that M < n, forcing us to use NKA, NKA_i and Broyden in a setting for which there 
are no theoretical results that the authors of this paper know of. One important distinction between Broyden and NKA 
is that for each iteration NKA stores a pair of state vectors, while Broyden only stores one. Consequently the memory 
requirements for Broyden are half that of NKA for the same depth of stored vectors. 

Depending on the tolerance chosen at each linear solve for JFNK, one can reasonably expect theoretical guarantees 
of quadratic convergence to hold in some neighborhood of the solution, however identifying that neighborhood is often 
considerably harder than solving the original problem. In summary, for the numerical experiments we present, there 
is little theory regarding performance, making the following numerical results of value. 



x„+i 



z 



z ( "> = arg min R « 



3. The k-Eigenvalue Formulation of the Boltzmann Transport Equation 

The problem that we use to compare the efficiency of these various nonlinear methods is the k-eigenvalue formu- 
lation of the linearized Boltzmann neutron transport equation. 

[£l + E)]i(r(?,£l,E) = j dE' j £/0%(r,£" -> E,Cl' ■Cl)if/(r,Q.,E') + ^- j dE'x(E' -> E)vL f {f,E')<j>{f,E'). 

(4) 

The unknown iff describes the neutron flux in space, r e R 3 , angle Cl e S 2 and energy E. Z, is the total cross section, 
E, is the scattering cross section and Sf is the fission cross section. The quantities v and^- characterize the rate and 
energy spectrum of neutrons emitted in the fission process. Integrating the flux over angle gives <p, the scalar flux 
at a given position and energy. For a thorough examination of the mathematical models of fission, including the 
Boltzmann transport equation, cf. |T][2]]. 

Discretization of Eq. Q is accomplished using 
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Sn (discrete ordinates) in angle: An S 4 level symmetric quadrature set is chosen and the solution is computed 
at the abscissas of that set and then integrated over angle using the quadrature weights. 

Multigroup in energy: Cross sections can be very noisy and it is therefore not practical to discretize the energy 
variable as one would a continuous function. Instead, the energy space is divided up into groups and it is 
assumed that the energy component for a cross section <x can be separated out: 

aif,E) ~ f(E)o- g (r), E g < E < E g -\ 

g- l dE<r{?,E) 



Jj- 1 dEf(E) 

3. Finite element or difference in space: The spatial variable can be treated in a variety of different ways. Here we 
explore results from two different discretization strategies as employed by the Los Alamos National Laboratory 
production transport codes PARTISN |22| and Capsaicin. PARTISN is used to generate results on a structured 
mesh with diamond (central) difference and Capsaicin to generate results on an unstructured polygonal mesh 
using discontinuous finite elements. 

Applying the and multigroup approximations, the linearized Boltzmann equation takes the form 

G , G 



(o,„ • v + ^(fO) ^ g Jf) = ^ J] o-,^ g (?Wf(?) + ~ vcr fA^ Xs '2l r (5) 

s'=i s'=i 

Here, 



• i// gtin represents the angular flux in direction Cl m in energy group g, which is the number of neutrons passing 
through some unit area per unit time ( — , ,. # , ) 

° r \cmr-Mev-ster-sec ) 

• <p g is the scalar flux, or angle-integrated angular flux. The Sn quadrature used to define angular abscissas (Cl m ) 
can be used to integrate if/ over all angle: f = Yim=\ w m4'g,m where w m are the quadrature weights ( cm 2 ;ec ) 

• cr, g is the total cross section, or interaction probability per area, for group g (^) 

• cr Sig >-, g is the 'inscatter' cross section, which is the probability per area that a neutron will scatter from group g' 
into group g(Jjr) 

• 07 ? is the fission cross section for group g (;JpJ 

• v is the mean number of neutrons produced per fission event 

• Xg'^g describes the energy spectrum of emitted fission neutrons in group g produced by neutrons absorbed from 
group g' 

Application of the spatial discretization then yields a matrix equation. For convenience, this equation can be expressed 
in operator notation as 

Lift = MSDi/r + -MFDtA. (6) 

k 

where L is the streaming and removal operator, S is the scattering operator, F is the fission operator, and M and D are 
the moment-to-discrete and discrete-to-moment operators, respectively, and Di^ = (p. 

If the fluxes are arranged by energy group, the form of L is block diagonal and it can be inverted onto a constant 
right-hand-side exactly by performing a so-called sweep for each energy group and angular abscissa. The sweep is an 
exact application of L~' and can be thought of as tracking the movement of particles through the mesh along a single 
direction beginning at the incident boundaries, which vary by angle, so that all necessary information about neutrons 
streaming into a cell from other 'upwind' cells is known before it is necessary to make calculations for that cell. 
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Most transport codes contain the mechanism to perform the sweep, but could not apply L directly without significant 
modification, therefore Eq. |6]) is generally thought of as a standard eigenproblem of the form 

k(f> = (i - DL^MS) - ' DL~'MF0. 
Because we seek the mode with the largest magnitude eigenvalue, power iteration is one possible iterative technique: 

Z+ 1 = (i - DL 1 MS)~ ' DL 1 MFcp z 

_ W T F<p z+1 
2+1 ~ W T Fcf> z 

Full inversion of (i-DL-'MS) is expensive, however, so it is generally more efficient to use a nonlinear fixed 

point iteration (FPI) in which the operator (i - DL _1 MS) is inverted only approximately using a nested inner-outer 
iteration scheme to converge the scattering term. Commonly, one or more 'outer iterations' are conducted in which 
the fission source is lagged. Each outer iteration consists of a series of inner 'within-group' iterations (one or more 
per energy group) in which the inscatter is lagged so that the within-group scattering can be converged. In general, 
the iteration can be written as 



= k, 



W r F0 z+ i 
W T F<f> z 



(8a) 
(8b) 



where W is a vector of weights and generally represents a volume integral. The k-eigenvalue can be thought of as the 
ratio of neutrons in successive generations and new neutrons are created in fission events, therefore it makes sense to 

adjust k using the ratio of fission sources. If P(k z ) — (l - DL~'MS) DL 'MF we recover a true power iteration, 
but there are numerous possible forms for this operator. It is typically more efficient to limit the number of sweeps, 
however. The most basic approach is to apply L~' via a sweep. In light of the fact that we are lagging the scattering 
altogether, we also consider another update for the eigenvalue that accounts for the change in the scattering as well. 



= DL 



■m|s + ^-f 

jT 



k- 



z+i 



W F(p z+ 



W T (lF<t> z -S(<t> z+1 -<t> z j) 



(9a) 



(9b) 



Clearly, if we assume that the scattering is converged, then the second term in the denominator of Eq. ( |9b] l is zero and 
we recover Eq. ( |8b] >. For a converged system, this term will indeed be zero and the ratio of the fission sources will go 
to one because the fission source has been suitably adjusted by £ so that the net neutron production is zero. 
From this we have that a fixed point of the iteration in Eq. |9]l is also a root of the residual function 



P(k)<f> 

y r FO(<) 
w T F<p 



(10) 



where 



P(*) 



I-DL 'MIS+ tF 



In order to initialize the iteration, a single sweep is performed on a vector of ones, E 
two-norm of the flux is then normalized to one: 



[11... l] r , and the scaled 



00 = 

k = 



DL" 1 M(S + F)£ 

IIdl-'mcs + F)£|L 



l 
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Here || • ||2 iS indicates the L 2 -norm normalized by the square root of the number of degrees of freedom. Once the 
residual is formulated, it is possible to apply any of the nonlinear solvers discussed to seek a root of F given in 
Eq. ([TO). The following section contains a comparison of JFNK, Broyden, NKA_i and NKA for the k-eigenvalue 
problem. 

4. Results 

Results are given for a cylinder with a 3.5 cm radius and a height of 9 cm modeled in two-dimensional cylindrical 
coordinates, similar to the cylindrical problem that was studied in |4| in three-dimensional Cartesian coordinates. The 
problem consists of a central 5-cm layer of Boron- 10 with 1 cm thick water layers on either side and 1 cm layers of 
highly enriched uranium on the ends. The top, bottom and radial boundaries are either all vacuum or all reflective (the 
inner radial boundary is a symmetry condition in cylindrical coordinates). This a 'difficult' problem (i.e., one with a 
high dominance ratio) because the problem is symmetric, having two fissile regions that are effectively 'decoupled' 
because of the reflector (water) and absorber (boron) between them, which means the second eigenmode is close to the 
fundamental mode; reflection further increases the difficulty of the problem. A 16-group Hansen-Roach cross section 
data set is used to generate the results presented here (the 16 group data was collapsed to 5 groups in the original 
paper |4)). 

A 175 (r-axis) by 450 (z-axis) mesh of equally sized squares is used in PARTISN for both the reflected and 
unreflected problems. The Capsaicin results are computed on an unstructured mesh comprising roughly the same 
number of cells in the r and z axes as the PARTISN computation, for a total of 79855 (possibly non-convex) polygons, 
each of which has 3 to 6 vertexes on a cell. These codes use similar methods, but there are several differences that lead 
to different iterative behavior and slightly different results. Firstly, PARTISN is an orthogonal mesh code that utilizes 
a diamond (central) difference (DD) spatial discretization, requiring the storage of only one cell-centered value per 
energy group (2). Capsaicin uses an unstructured mesh, which has the advantage of being able to model geometrically 
complex problems with higher fidelity, but it comes at a cost. Finding an efficient sweep schedule on an unstructured 
mesh that minimizes the latency in parallel computations is a difficult problem in itself 11231 l24l [25). In contrast, 
a parallel sweep schedule that is nearly optimal is easy to implement for structured meshes |26| . Furthermore, a 
discontinuous finite element method (DFEM) spatial discretization is employed in Capsaicin such that the number of 
unknowns per cell is equal to the number of nodes Il27l . While the DFEM has better accuracy than DD, potentially 
enabling the use of commensurately fewer mesh cells for the same solution accuracy, it is more costly to compute the 
solution to the DFEM equations than the DD equations, and the DFEM is thus less efficient than DD when calculations 
are performed on meshes of similar size. 

The second difference between the two codes is that reflecting boundary conditions can cause instabilities with the 
DD spatial discretization and the angular differencing used in PARTISN so, in order to achieve stability, the reflected 
flux is 'relaxed' . This is done using a linear combination of the new and old reflected fluxes as the boundary source 
for the next iteration: 

^relaxed ^^refl ^ 

The relaxation parameter, r, is, by default, \ for this problem. This relaxation is not necessary in Capsaicin because 
it uses a more accurate spatial and angular discretization, including the use of starting directions for calculations in 
cylindrical coordinates [2], as well the use of reflected starting directions when reflection conditions are specified 
on the radial boundary. The eigenvalues computed by PARTISN are k — 0.1923165 and k = 0.8144675 for the 
unreflected and reflected cases, respectively. Because the codes employ different discretizations, the eigenvalues 
calculated by Capsaicin were k — 0.191714 for the unreflected case and k — 0.814461 for the reflected case. The 
Capsaicin discretization results in roughly 50 million degrees of freedom while the PARTISN discretization results in 
a little over 5 million degrees of freedom. 

The three nonlinear solvers that we consider in this paper are implemented in the NOX nonlinear solver package 
that is part of the larger software package Trilinos 10.6 |28]. We use JFNK as it is implemented in NOX and have writ- 
ten implementations of both NKA and Broyden's method in the NOX framework as user supplied direction classes. 
These two direction classes are available for download on Sourceforge at http://sourceforge.net/projects/ 



10 



nlkain/ In addition to the existing JFNK interface, interfaces to the NOX package were developed for the Broy- 
den and NKA methods so that all methods except fixed-point iteration are accessed through the same Trilinos solver. 
JFNK can be extremely sensitive to the choice of forcing parameter, 77, therefore we explored variations of the three 
possibilities implemented in NOX: 



Constant: r] z 
Type 1 (EW1): t] z 



no 



\\F Z \\ - |I7 z -i<J z _i + F z . 



Type 2 (EW2): 77- = y 



F.-ll 



(12a) 

1 + Vs ( 1+V5 "I 

If tj _ \ > .1, then 77. <— max -j 77 z , 77, 2 j >. (12b) 
If T^j-i > •!> tnen ?7 Z < — max{?7 z , y?7^ j }. (12c) 



Types 1 and 2 were developed by Eisenstat and Walker 1291 , therefore they are denoted EW1 and EW2 in the results 



that follow. The NOX default parameters were also used for EW1 and EW2: 770 = 10 , 77 mi „ 
a — 1.5 and y — 0.9. Note that for the results below, the convergence criterion is W\a,s < 10" 9 . 



10- b , 77,, 



10- 



4.1. Unreflected Cylinder 
4.1.1. PARTISN 

Tables Ta|[Td show the number of JFNK and total inner GMRES iterations, the number of sweeps, the CPU time 
and the percentage of that time that was spent computing the residual, respectively. As can been seen in Tables [Ta| and 
lb the GMRES subspace size does not affect the number of Newton iterations for the most part and has little effect 
on the total number of GMRES iterations or sweeps. As the forcing parameter decreases, however, more sweeps are 
required because more GMRES iterations are required, both overall and per Newton step. Smaller subspace sizes 
require more restarts, which in turn each require one additional sweep. In general, for NKA, NKA with the j3 — 1 
(NKA_i) and Broyden, decreasing the subspace size leads to an increase in sweep count, but the effect is negligible 
for NKA in this case, while it is significant for NKA_i and Broyden. Broyden(lO) also requires many more iterations 
than Broyden(5) in contradiction to the general trend. NKA_[ also failed to converge for a subspace size of 5. Overall, 
NKA requires the fewest sweeps and displays a predictable trend of decreasing sweep count with increasing subspace 



size. As Table lc shows, the runtimes are consistent with the sweep count. NKA is more than 3 times faster than FPI 

-1 is, at best, comparable to JFNK and, at worst, more than 
-at best, it is almost as efficient as 



and between 1.6 and 2.3 times as fast as JFNK, while NKA_ 
an order of magnitude slower than FPI. The behavior of Broyden is rather chaotic- 
NKA, but at worst it too is slower than FPI. 

The percentage of the total time spent evaluating the residual is shown in Table Id because, for this particular code 
and problem, the solver time requires a non-trivial portion of the run-time. As can be seen, this percentage is slightly 
smaller for NKA and NKA_i than Broyden and noticeably smaller for Broyden than for JFNK. There is also a slight 
decrease as the subspace size increases in all cases, which is expected since the respective solvers must do more work 
for a larger subspace than a smaller one. JFNK requires the least amount of time in the solver with approximately 
95% of the run-time spent in the sweep regardless of GMRES subspace size. Despite these numbers, we note that 
NKA still requires the least run-time of all of the methods. 

The plots in Fig. [I] show the scaled L 2 -norm of the residual as a function of the number of sweeps. For JFNK 
where there are a few outer Newton iterations with multiple sweeps required to do the inner inversion of the Jacobian, 
the L 2 -norm is given at each Newton step plotted at the cumulative sweep count up to that point. Fig. 



La 



shows that 

the behavior of NKA is similar for all subspace sizes and much more efficient than FPI. Fig.fTblshows NKA_i, which 



converges fairly quickly for the larger restarts, but then has trouble reducing the residual for subspaces of 5 and 10. 



While 10 eventually converges, 5 seems to stagnate. Fig. lc shows the behavior of Broyden. As can be seen, the norm 
fluctuates quite erratically for every subspace size except 30 and cannot compete with NKA with a subspace of 5. 
Fig. [id] shows some of the more efficient JFNK results plotted at each Newton iteration for the current sweep count. 



And finally, Fig. le shows a comparison of the most efficient results for each method. 
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Table 1: PARTISN unreflected cylinder: Number of outer and inner JFNK iterations, sweeps, run-time and percentage 
of the run-time spent computing the residual to an accuracy of ||F||2,j < 10~ 9 for the various methods. 

(a) Outer JFNK/total inner GMRES iterations 

subspace Q1 q.01 q.001 EW1 EW2 

30 8(30) 6 (42) 5 (45) 6(39) 5 (38) 

20 8 (30) 6 (42) 5 (45) 6(39) 5 (38) 

10 8 (30) 6 (42) 5 (48) 6(41) 5 (42) 

5 8 (30) 6 (48) 5 (50) 5 (43) 5 (44) 



(b) Number of sweeps (FPI converged in 99) 



JFNK 77 
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(c) CPU time (s) (FPI converged in 245.8 s) 
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(d) Percentage of CPU time spent in the residual evaluation 
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Figure 1: PARTISN unreflected cylinder: Scaled L 2 -norm of the residual as a function of number of sweeps for 
the various methods and subspace sizes. Each of the methods is plotted on the same scale to simplify comparisons 
between panels. Note that, in panel (d), points plotted on the lines indicate when a JFNK iteration starts (JFNK 
requires multiple sweeps per iteration). In panels (a), (b), and (c) we plot one point per two iterations of the method. 
In panel (e) the convention for iterations per plotted points is the same as in panels (a) through (d). 
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4.1.2. Capsaicin results 

The different spatial and angular discretization methods used in Capsaicin alter the numerical properties of the 
operators associated with the k-eigenvalue problem, affecting the convergence rates and curves of the various solution 
methods. While the methods in Capsaicin have better accuracy than those in PARTISN, they are also more costly 
due to the additional degrees of freedom associated with the DFEM spatial discretization and the use of starting 
directions in the angular discretization. This is evident in the relatively long run times associated with the Capsaicin 
calculations, even though they were computed with a parallel decomposition in energy, in addition to the parallel mesh 
decomposition, on 24 processors. Because of other differences in implementation between PARTISN and Capsaicin, 
a minimum of 99% of the computation time is spent in the evaluation of the residual (for all solution methods) and 
therefore is not reported in the tables. 

Tables 2apc show the number of JFNK and total inner GMRES iterations, the number of sweeps and the CPU 
time that was spent computing the residual, respectively. As can been seen in Tables 2a and 2b as for PARTISN the 
GMRES subspace size does not affect the number of Newton iterations for the most part and as the forcing parameter 
decreases more sweeps are required. However, the subspace size does affect the total number of GMRES iterations, 
hence the sweep count, for the smaller and varying forcing parameters as was not seen with PARTISN. Once again, for 
NKA, NKA_! and Broyden, decreasing the subspace size leads to an overall increase in sweep count and Broyden(lO) 
requires many more iterations than Broyden(5) in contradiction to the general trend. NKA_i failed to converge for 
subspace sizes of 5 and 10 where PARTISN only failed for 5. Once again, NKA requires the fewest sweeps and 
displays a predictable trend of decreasing sweep count with increasing subspace size. Table 2c demonstates that once 
again the runtimes are consistent with the sweep count. NKA is more than 6 times faster than FPI and between 1.8 
and 3.6 times as fast as JFNK for comparable subspace sizes, while NKA_] is comparable to JFNK when it converges. 
The behavior of Broyden is, once again, chaotic, at times almost as efficient as NKA, but at worst slower than JFNK, 
although in this case it is always more efficient than FPI. 

The plots in Fig. [2] show the scaled L 2 -norm of the residual as a function of the number of sweeps. The behavior 
is similar to that seen in Fig. [I] FPI experiences an abrupt change in slope between ||-F||2, S = 10~ 8 and 10~ 9 that is 
not seen in the PARTISN results and leads to a much larger iteration count. It is also interesting to note that in the 
NKA_i results, shown in Fig. [2b] subspaces of 5 and 10 seem to stagnate for large numbers of iterations instead of 



consistently dropping to zero as they do for NKA in Fig. 2a 
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Table 2: Capsaicin unreflected cylinder: Number of outer and inner JFNK iterations, sweeps and run-time spent 
computing the residual to an accuracy of ||F||2, S < 10~ 9 for the various methods. 

(a) Outer JFNK/total inner GMRES iterations 

n 

subspace Q1 q.OI 0.001 EW1 EW2 

30 8(35) 6 (47) 5 (49) 5 (37) 5 (39) 

20 8 (35) 6 (47) 5 (49) 5 (37) 5 (39) 

10 8 (35) 6 (60) 5 (62) 5 (51) 5 (52) 

5 9 (62) 6 (66) 5 (68) 6 (85) 5 (53) 



(b) Number of sweeps (FPI converged in 202) 
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(c) CPU time (ks) (FPI converged in 12.71 ks) 














JFNK 77 






subspace 


NKA 


NKA_! 


Broyden 


0.1 


0.01 


0.001 


EW1 


EW2 


30 


1.9 


3.3 


2.4 


3.4 


4.2 


3.9 


3.4 


3.4 


20 


1.9 


3.6 


3.4 


3.4 


4.1 


4.0 


3.5 


3.3 


10 


2.1 




8.2 


3.5 


5.0 


5.1 


4.7 


4.3 


5 


2.1 




4.5 


5.7 


5.9 


5.9 


7.6 


4.7 



15 



Figure 2: Capsaicin unreflected cylinder: Scaled L 2 -norm of the residual as a function of number of sweeps for 
the various methods and subspace sizes. Each of the methods is plotted on the same scale to simplify comparisons 
between panels. Note that, in panel (d), points plotted on the lines indicate when a JFNK iteration starts (JFNK 
requires multiple sweeps per iteration). In panels (a), (b), and (c) we plot one point per two iterations of the method. 
In panel (e) the convention for iterations per plotted points is the same as in panels (a) through (d). 
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4.2. Fully Reflected Cylinder 
4.2.1. PARTISN results 

Here, we duplicate the results shown in 4. 1 . 1 only this time, reflection is added to the top, bottom and outer edges 
of the cylinder. Tables 3a||3d show the number of JFNK and total inner GMRES iterations, the number of sweeps, 
the CPU time and the percentage of that time that was spent computing the residual, respectively. As can be seen in 
Tables [3a] and [3b] increasing the subspace size has a non-negligible effect on Newton iteration count, and a significant 
effect on GMRES iteration count and sweep count. One additional parameter that comes into play for this problem is 
the maximum number of inner GMRES iterations allowed. This parameter was set to the NOX default of 30 for these 
computations and in many cases the inner iteration did not converge to the specified tolerance because it exceeded this 
limit. We note that, in our experience, increasing this upper bound tends to degrade the performance of JFNK, just 
as decreasing the forcing parameter often degrades the performance. For NKA and Broyden, decreasing the subspace 
size also leads to an increase in sweep/iteration count, quite noticeably in this case. The only subspace size that 
converged for NKA_! was 30, which requires more sweeps than any other iterative method including FPL For smaller 
subspace sizes, Broyden also fails to converge. When comparing the runtimes shown in Table 3c for all of the iterative 
methods, we see that JFNK with GMRES(5) is only slightly more efficient than FPI, while JFNK with GMRES(30) 
is comparable to NKA(5). Broyden, when it converges, is slower than NKA, but comparable to JFNK. The runtime 
for NKA_i is an order of magnitude greater than any other method. Table [3d] shows that the percentage of time spent 
in the residual is ranges from approximately 95% for JFNK with GMRES(5) to approximately 90% for GMRES(30). 
The percentage is smallest for NKA and NKA_i, and slightly larger for Broyden. Once again, we note that despite 
the fact that a larger percentage of time is spent in the NKA solver than in JFNK or Broyden, it is still more efficient 
in terms of CPU time. 

The plots in Fig.[3]show the scaled L 2 -norm of the residual as a function of the number of sweeps. Fig. [3a] shows 
that the behavior of NKA is similar for all subspace sizes and much more efficient than FPI in all Fig. |3b| shows 
that NKA_i behaves erratically for subspaces of 5 and 10, and at 20 seems to stagnate before reaching the specified 
convergence criterion. Even when it does converge for a subspace size of 30, it is slower even than FPI. Fig. 3c shows 
the behavior of Broyden. As can be seen, the norm fluctuates erratically and, although it appears to be converging 
initially for subspaces of 5 and 10, it never achieves an error norm below 10~ 8 and eventually diverges. Broyden(30) 



and Broyden(20), which converge to the desired norm, cannot compete with NKA(20). Fig. 3d shows some of the 



more efficient JFNK results plotted at each Newton iteration for the current sweep count. And finally, Fig. 3e shows 
a comparison of the most efficient results for each method. Clearly, NKA(20) is significantly more efficient than the 
other iterative methods, although we note that NKA(10) is comparable to JFNK with GMRES(20). 
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Figure 3: PARTISN reflected cylinder: Scaled L 2 -norm of the residual as a function of number of sweeps for the 
various methods and subspace sizes. Each of the methods is plotted on the same scale to simplify comparisons 
between panels. Note that, in panel (d), points plotted on the lines indicate when a JFNK iteration starts (JFNK 
requires multiple sweeps per iteration). In panels (a), (b), and (c) we plot one point per six iterations of the method. 
In panel (e) the convention for iterations per plotted points is the same as in panels (a) through (d). 
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Table 3: PARTISN reflected cylinder: Number of outer and inner JFNK iterations, sweeps, run-time and percentage 
of the run-time spent computing the residual to an accuracy of ||F||2,j < 10~ 9 for the various methods. 



(a) Outer JFNK/total inner GMRES iterations 
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(b) Number of sweeps (FPI converged in 389) 
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(c) CPU time (s) (FPI converged in 971.7 s) 
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(d) Percentage of CPU time spent in the residual evaluation 
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4.2.2. Capsaicin results 

Here, we duplicate the results shown in 4. 1 .2 only this time, reflection is added to the top, bottom and outer edges 
of the cylinder. Tables |4a||4c| show the number of JFNK and total inner GMRES iterations, the number of sweeps and 
the CPU time that was spent computing the residual, respectively. As can be seen in Tables 3a and 3b increasing the 
subspace size does not affect the Newton iteration count, but it does have a significant effect on the GMRES iteration 
count and sweep count. In this regard, the behavior is more similar to the unreflected case than the PARTISN reflected 
case. For NKA and Broyden, there is an overall increase in iteration count, but for Broyden it is slight compared 
to that seen for the unreflected cylinder and the reflected PARTISN results. Also, for Capsaicin, Broyden always 
converges whereas for PARTISN only subspaces of 20 and 30 converged. The only subspace size that converged for 
NKA_j was 30, which is consistent with the PARTISN results, and which once again requires more sweeps than any 
other iterative method including FPL When comparing the runtimes shown in Table 4c for all of the iterative methods, 
we see that JFNK with GMRES(5) is generally slower than FPI, while JFNK with GMRES(30) is slower than NKA 
with a subspace of 5. Broyden is slower than NKA, but always more efficient than JFNK. The runtime for NKA_i is 
twice that for FPI. 

The plots in Fig. [3] show the scaled L 2 -norm of the residual as a function of the number of sweeps. Fig. 



4a 



shows 

that the behavior of NKA is similar for all subspace sizes and much more efficient than FPI while Fig.Bblshows that 
NKA_i seems to stagnate before reaching the specified convergence criterion subspaces of 5 and 10, and at 20 . Even 
when it does converge for a subspace size of 30, it is slower even than FPI. Fig. 4c shows the behavior of Broyden. 
As can be seen, the convergence is fairly regular, in sharp constrast to the PARTISN results where the norm fluctuated 
erratically and often failed to converge. Fig. 4d shows some of the more efficient JFNK results plotted at each Newton 
iteration for the current sweep count. And finally, Fig. 3e shows a comparison of the most efficient results for each 
method. Clearly, NKA(20) is more efficient than the other iterative methods. 
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Figure 4: Capsaicin reflected cylinder: Scaled L 2 -norm of the residual as a function of number of sweeps for the 
various methods and subspace sizes. Each of the methods is plotted on the same scale to simplify comparisons 
between panels. Note that, in panel (d), points plotted on the lines indicate when a JFNK iteration starts (JFNK 
requires multiple sweeps per iteration). In panels (a), (b), and (c) we plot one point per two iterations of the method. 
In panel (e) the convention for iterations per plotted points is the same as in panels (a) through (d). 
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Table 4: Capsaicin reflected cylinder: Number of outer and inner JFNK iterations, sweeps and run-time spent com- 
puting the residual to an accuracy of ||F||2, S < 10~ 9 for the various methods. 
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(b) Number of sweeps (FPI converged in 99) 
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(c) CPU time (ks) (FPI converged in 6.25 ks) 
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5. Conclusion 



The benefit of JFNK is its "Newton-like" convergence achieved by developing an arbitrarily accurate approxima- 
tion of the inverse of the Jacobian at each 'outer' iteration, but the cost is repeated function evaluations in each of 
these 'outer' iterations and the wasteful discarding of potentially useful subspace information. In contrast Broyden 
and Anderson Mixing only perform a single function evaluation at each iteration, but continue to use 'old' information 
from previous iterations to improve their estimate of the Jacobian. The drawback for these methods is that the approx- 
imate Jacobian or its inverse is based on an amalgam of new and old information, so they are unlikely to converge in 
fewer iterations than Newton's method. Performance of all these methods will clearly depend on how the Jacobian 
is changing from iteration to iteration, and how information collected at each function evaluation is used. Memory 
requirements for Broyden are half that of NKA making it an attractive alternative. 

Analytic examinations in 1 14 1 show that variants of Anderson Mixing, including NKA, behave like a Rrylov 
method (such as GMRES) when applied to most linear problems. Further, the hypotheses of non-stagnation of GM- 
RES can be removed at the cost of a modest performance penalty. 

Our numerical results indicate that Anderson Mixing in the form of NKA found solutions in the PARTISN and 
Capsaicin codes, for both the reflected and unreflected problem, in the fewest number of function evaluations and 
the shortest runtimes. These problems contain real material properties and real geometries, and were run at a large 
computational scale. Our results highlight the strength of this method: regularity, consistency and efficiency. In our 
results, NKA was shown to bring the norm down to zero smoothly, much as FPI and JFNK do, but with greater 
efficiency than those methods. Broyden and NKA_i, while they at times achieved excellent performance, did not 
always demonstrate this same smooth convergence behavior and often diverged. Based on these results we feel that 
NKA may be well-suited to other computational physics problems beyond neutron transport. 
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Appendix A. Proof of Theorem 2.3 

Proof of Theorem \2.3\ From the linearity of the problem w, = Av,, and from the update rule x ;l+ i = x„ - V„+i, we have 
x„ = x - £" = i v >- and then 



11 




from which we have 



n 



/(*») - J] 



(«) 

z 'w,- = r - 



A^d+zfV/. 



i=1 



We use the above to compute the n + 1 residual using the unmodified NKA update 



r«+i - 



Ax„ + i - b 
Ax„ - b - Av„ + i 



Ax„ - b - A [XU Mzf + /(x„) - XU w^ ] 
= r - A E» , v,- - A [Zti + r <> - A EtiC 1 + 
(I - A) (r - A 2^(1 
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If, for the modified update, ||w„||2 + then the « th coefficient is updated as follows 

™)|| IL A v« n i J n h 



z„ ± e n — n = z„ ± s- 



l|W„|| 2 



The n th residual becomes 
( 



r n+1 = (I-A) 



!=1 



l|Av„|| 



|ro-AE? =1 (l+z}"> 
HAv„|| 2 



-Av„ 



from which we have 

l|r„ + i|| 2 < ||I -A|| 



r -AZ? =1 (l+ Z ^K±e 



l|Av„|| 



(1 + e )||I-A||||r () -A2:; , = 1 (l +zr J )Vi|| 2 , 
where || • | denotes the operator 2-norm. Let z (n) = (zf\ ■ ■ ■ , zi ) denote the solution to the minimization problem 



arg min yeR „ 
and let 'V,, = span{vi, . . 



/(X„) - ^ w iyt 



1=1 



arg mm,. 



r -A2j(l +yi)v,' 



, v„), then the above gives 



||r„ +1 || 2 <(l+e)||I-A||min||r -Av|| 2 . 

veV n 



(A.l) 



If the modification of z n is not performed, then the factor of (1 + e) would simply be one. In either case the above 
bound on ||r M+ i||2 given in Eq. ( A.l i holds. 

We now turn our attention to the question of when the Krylov spaces (7C„ := span{ro, Aro, . . . , A" _1 ro}) are 
expanding with n. In what follows we assume ro + 0, i.e. that our initial guess xo is not a solution. Let M be the 
lowest natural number so that 'Km+x £ 'Km- Then the vectors r ( >, Ar , . . . , A M ~'ro are linearly independent and there 
is a unique set of coefficients ao, . . . , %_i such that 



A M r = J] a < A ' r °' 



If oro = 0, then we would have 

M-2 

A M - l r - J] «/ + iA'r 

and, because A is non-singular, this would imply that 'Km c 7Cm-i contradicting the definition of M. As such we have 



ro 



A M r„ - J] a,A'r 





r 

1 


= A 


a 



M-2 



A^'ro - ^ a M A'r 

i=0 



and hence a solution. This allows us to conclude if we haven't found a solution to the problem within 'Km, then 
"Km c 'Km+i- 

It now suffices to show that *V„ = 7C„. We begin by noting *Vi = span{vj} = span{ro) = TCi, and, because A is of 
full rank, Wi = Avj = Aro + 0. Our inductive hypothesis is that there is some natural number n, e.g. n — 1, so that 
w„ + and that for all natural numbers j < n, 'Vj = ftj and Ax ; - b + 0. That we haven't found a solution in the first 
n iterations is sufficient to conclude from above arguments that 'Kx c . . . c "7C n+ i. 
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The NKA update rule, where the modification to has already been applied (because w„ + 0), may be written 



as 



v B+ i = 



Zti^ + /(*-)- 22=1 w^ B) 

Sti^ + ro-A^.vKl+zf) 
[(1 + z\ 1} )ro] - Ar (l + for n = 1 (recall v, = r ) 

[Zti v ^"' + r o - A 2"=7 + z[ n) )] - Av„(l + z* n) ) for n*\ 



In the case n = 1, V2 = (1 + z\ )(ro - Aro). Our modification ensures that 1 + Zj l; ^ 0. If ro — Aro, then NKA has 
found the solution X] = xo - ro. We have excluded this by assumption and so ^2 = span{vi , V2} = spanjro, Aro) = 'Ki 
and W2 = AV2 + 0. Our inductive hypothesis holds for n — 2 

For the cases n + 1, we know that 'Vn-i = 'Kn-x- This implies that any linear combination of Vi, . . . , v„_i lies 
in 7C„_i, and so any linear combination of Avi, . . . , Av„_i lies within < K n . We conclude that the quantity in brackets 
in the last of the preceding lines lies within < K n . Since x„ is not the solution, spanjvi, . . . ,\ n -i) = 'Kn-i Q K n = 
spanjvi, . . . , v„). This is sufficient to conclude that v„ may be written as the sum v + A" _1 ro, where \ e "V^-i = < K n -\- 
By construction (1 + z^) + and so if \ n+l = 0, then 7C„ + i Q < K n and hence x„ +1 will be a solution. Since we assume 
x„ + i is not the solution, our inductive hypothesis holds for n + 1. □ 
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